Plasma proteomics-based biomarkers for predicting response to mesenchymal stem cell therapy in severe COVID-19

Background The objective of this study was to identify potential biomarkers for predicting response to mesenchymal stem cell (MSC) therapy from the pre-MSC treatment plasma proteomic profile in severe COVID-19, in order to optimize treatment choice. Methods A total of 58 patients selected from our previous RCT cohort were enrolled in this study. MSC responders (n = 35) were defined as patients whose resolution of lung consolidation was ≥ 51.99% (the median value for resolution of lung consolidation) from pre-MSC to 28 days post-MSC treatment, while non-responders (n = 23) were defined as patients whose resolution of lung consolidation was < 51.99%. Plasma collected before MSC treatment was analyzed using data-independent acquisition (DIA) proteomics. Multivariate logistic regression analysis was used to identify pre-MSC treatment plasma proteomic biomarkers that might distinguish between responders and non-responders to MSC therapy. Results In total, 1101 proteins were identified in plasma. Compared with the non-responders, the responders had three upregulated proteins (CSPG2, CTRB1, and OSCAR) and 10 downregulated proteins (ANXA1, AGRG6, CAPG, DDX55, KV133, LEG10, OXSR1, PICAL, PTGDS, and S100A8) in plasma before MSC treatment. In the logistic regression model, lower levels of DDX55, AGRG6, PICAL, and ANXA1 and higher levels of CTRB1 before MSC treatment were predictors of response to MSC therapy, with an area under the ROC curve (AUC) of 0.910 (95% CI 0.818–1.000) in the training set. In the validation set, the AUC was 0.767 (95% CI 0.459–1.000). Conclusions The responsiveness to MSC therapy appears to depend on baseline levels of DDX55, AGRG6, PICAL, CTRB1, and ANXA1. Clinicians should take these factors into consideration when deciding whether to initiate MSC therapy in patients with severe COVID-19. Supplementary Information The online version contains supplementary material available at 10.1186/s13287-023-03573-4.


Background
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused the ongoing coronavirus disease 2019 (COVID-19) pandemic and has become a prominent public health event worldwide. As of March 2023, the number of infections had reached approximately 762 million, with over 6.8 million deaths [1]. Patients with severe or critical COVID-19 often have lung injury with a poor prognosis. On chest CT, lung injury appears as ground-glass opacity, consolidation, crazy-paving pattern, and other findings. Consolidation on chest CT refers to the replacement of alveolar air with pathological fluids, cells, or tissues, as indicated by an increase in pulmonary parenchymal density that obscures the borders of underlying arteries and airway walls [2]. In patients with COVID-19, consolidation is also thought to be an indicator of disease progression [3]. The prognosis of CT consolidation varies by individual, and some patients will develop post-acute fibrosis [4]. Sequelae such as lung injury after recovery from acute COVID-19 are severe threats for survivors [5]. New therapeutic strategies are vital for the treatment of COVID-19, especially for severe or critical patients.
Mesenchymal stem cells (MSC) have been shown to play therapeutic roles in acute lung injury, ARDS, and lung fibrosis, owing to their anti-inflammatory and immunomodulatory properties [6,7]. As of March 2023, more than 90 clinical studies of MSC for COVID-19 had been registered at ClinicalTrials.gov [8,9]. Although preliminary clinical trial data have demonstrated good safety and encouraging efficacy of MSC for COVID-19, some patients still fail to respond to MSC therapy. Clinical studies must, therefore, establish strategies to identify the biological characteristics of potential responders to MSC therapy in order to optimize treatment choice.
Plasma has a dynamic range of protein abundance that exceeds ten orders of magnitude, making it an excellent source of biological information [10]. Mass spectrometry (MS) is a powerful tool for analyzing intact proteins from plasma. Systematic omics studies, such as proteomics, provide a comprehensive understanding of biological processes in patients and facilitate the discovery of biomarkers [11][12][13].
In March 2020, we performed a multicenter, randomized, double-blind phase II study of MSC therapy in patients with severe COVID-19 (ClinicalTrials.gov: NCT04288102). The findings showed that MSC was a potentially effective therapeutic approach for patients with severe COVID-19 [14,15], but not all patients responded well. To further understand the proteomic biomarkers of response to MSC therapy in patients with severe COVID-19, we characterized the proteomic profile of pre-MSC treatment plasma samples from 58 patients (35 responders versus 23 non-responders) using proteomics assays. The present study aimed to identify potential biomarkers for predicting response to MSC therapy from the pre-MSC treatment plasma proteomic profile in severe COVID-19.

Study design
This prospective longitudinal cohort study aimed to identify potential pre-MSC treatment proteomic biomarkers for predicting response to MSC therapy in severe COVID-19. Patients and data were from the MSC group of our previous randomized trial, which enrolled patients between March 6, 2020, and March 20, 2020, at two hospitals in Wuhan, Hubei, China (NCT04288102). First, we identified and compared the differentially expressed proteins between the responders and non-responders. The proteomic data from this cohort were then randomly divided into training and validation sets at a 4:1 ratio to build a prediction model, which was evaluated with receiver operating characteristic (ROC) methods.

Participants and groups
As previously described [14,15], 66 patients in the MSC treatment group and 35 patients in the placebo group were enrolled in our previous multicenter, randomized, double-blind phase II trial of MSC therapy in patients with severe COVID-19. Inclusion criteria were hospitalized patients of either sex, aged 18-75 years, with severe COVID-19 confirmed by real-time reverse transcription PCR assay. Patients had pneumonia combined with lung damage confirmed by chest CT. Exclusion criteria were shock, organ failure, invasive ventilation, malignant tumor, pregnancy, lactation, or co-infection with other pathogens. Human umbilical cord MSCs (4.0 × 10^7 cells per dose) or placebo were infused intravenously three times at 3-day intervals. The endpoint was the percentage decline rate of the proportion of consolidation volume to whole lung volume on high-resolution chest computed tomography (HRCT) from pre-MSC treatment to 28 days post-MSC treatment, which was measured by centralized imaging interpretation based on imaging software (LIAIS). The LIAIS-assisted lung volumetry and densitometry procedure was as described in our previous report [14]. Briefly, consolidation volume was segmented automatically after importing raw CT images into the software. Subsequently, the segmentation was corrected by three independent reviewers on the software platform. The percentage decline rate of consolidation volume was defined as (consolidation proportion of the whole lung volume at day 28 − consolidation proportion of the whole lung volume pre-MSC treatment)/(consolidation proportion of the whole lung volume pre-MSC treatment).
In this study, patients were selected from the 66 patients of the MSC group. Of these, one patient withdrew consent, three patients had no pre-MSC treatment plasma, and four patients were lost to follow-up at day 28. Eventually, a total of 58 patients were included in this study (Fig. 1). The previous results showed that MSC significantly promoted the resolution of lung consolidation at day 28, but not in all patients. The median value for resolution of lung consolidation was 51.99%. Therefore, in this study, we used the median value of resolution of lung consolidation for all enrolled patients as the cutoff value and classified the patients into responders (n = 35) and non-responders (n = 23). MSC responders were defined as patients whose resolution of lung consolidation was ≥ 51.99%, while non-responders were defined as patients whose resolution of lung consolidation was < 51.99%.
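The endpoint arithmetic and the responder cutoff described above can be sketched as follows. This is a minimal illustration, not the study's own code: the function names are hypothetical, and it assumes that "resolution" is simply the negative of the percentage decline rate, so that a resolution ≥ 51.99% corresponds to a decline rate ≤ −51.99%.

```python
CUTOFF_PCT = 51.99  # median resolution of lung consolidation reported in the text


def decline_rate_pct(pre, day28):
    """Percentage change in consolidation proportion from pre-MSC to day 28.

    `pre` and `day28` are consolidation volume / whole lung volume.
    A negative result means the consolidation shrank.
    """
    if pre <= 0:
        raise ValueError("baseline consolidation proportion must be positive")
    return (day28 - pre) / pre * 100.0


def is_responder(pre, day28, cutoff_pct=CUTOFF_PCT):
    """Assumption: resolution is the negative of the decline rate."""
    resolution = -decline_rate_pct(pre, day28)
    return resolution >= cutoff_pct
```

For example, a patient whose consolidation proportion falls from 0.10 to 0.04 has a decline rate of about −60% and would be classed as a responder under this reading.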
Ethical approval was obtained from the Ethics Committee of the Fifth Medical Center, Chinese PLA General Hospital (2020-013-D). Written informed consent was obtained from all enrolled patients. Demographic information, clinical characteristics, laboratory examinations, and CT results were obtained using the electronic data capture system.

Samples
On the morning prior to MSC treatment, peripheral blood samples were collected from the median cubital vein by venipuncture into 10-mL vacutainer tubes containing EDTA and mixed well by gentle inversion. Plasma was separated by centrifugation at 2500g for 10 min at 4 °C and stored at −80 °C for further proteomic analysis.

Liquid chromatography with tandem mass spectrometry (LC-MS/MS)
Proteomics assays were performed using the data-independent acquisition (DIA) method. Total proteins in plasma were extracted, denatured, and digested into peptides using trypsin. An Orbitrap Exploris 480 mass spectrometer was used in conjunction with an Easy-nLC 1200 system to collect LC-MS/MS data. Peptides were separated on a C18 analytical column (75 μm × 25 cm, C18, 1.9 μm, 100 Å), and the gradient was established using mobile phase A (0.1% formic acid) and mobile phase B (80% ACN, 0.1% formic acid) at a flow rate of 300 nL/min. Each scan cycle for DIA mode analysis included one full-scan mass spectrum (R = 60 K, AGC = 3e6, Max IT = 30 ms, scan range = 350-1250 m/z) followed by 40 variable MS/MS events (R = 30 K, AGC = 1000%, Max …). The library-free MS data analysis was performed using DIA-NN software (version 1.8). Deep learning methods were used to predict libraries from the SwissProt human protein sequence database. A spectral library constructed from the DIA data using the MBR function was used for data reanalysis, with a final precursor and protein FDR of 1%. The DIA-NN output files providing quantification information for the protein groups were subsequently utilized. Proteins were first grouped and filtered to retain those detected in more than 50% of the samples in at least one category. Missing values were imputed using values from a normal distribution around the detection limit of the mass spectrometer. The mean and standard deviation of the real intensity distribution were calculated, and a new distribution with a 1.8-standard deviation downshift and 0.25-standard deviation width was generated. These values were used to impute the entire matrix, allowing statistical analysis.
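The downshifted-normal imputation described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the function name and the `None` encoding of missing values are hypothetical, intensities are assumed to be already log-transformed, and the mean and standard deviation are taken over the whole matrix because the text says the values were used to impute the entire matrix.

```python
import random
import statistics


def impute_downshift(matrix, shift=1.8, width=0.25, seed=0):
    """Fill missing entries (None) with draws from a downshifted normal.

    The imputation distribution is centred at mean - shift * sd of the
    observed values and has a standard deviation of width * sd.
    Returns a new matrix; the input is left unchanged.
    """
    observed = [v for row in matrix for v in row if v is not None]
    mu = statistics.mean(observed)
    sd = statistics.stdev(observed)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    imp_mu = mu - shift * sd
    imp_sd = width * sd
    return [
        [v if v is not None else rng.gauss(imp_mu, imp_sd) for v in row]
        for row in matrix
    ]
```

Observed intensities pass through untouched, and imputed values cluster near the low end of the distribution, which mimics proteins that fell below the detection limit.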

Differentially expressed proteins and functional annotation
Differentially expressed proteins (DEPs) were defined as those with a fold change (FC) > 1.5 or FC < 1/1.5 and p < 0.05 between the responders and non-responders. The "clusterProfiler" package was used to perform enrichment analysis of the DEPs, including Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis and gene ontology (GO) analysis (biological process (BP), molecular function (MF), and cellular component (CC)).
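The DEP rule stated above reduces to a small classification function. The sketch below is illustrative only; the function name and the labels "up", "down", and "ns" (not significant) are hypothetical, and fold change is assumed to be the responder-to-non-responder ratio on the linear scale.

```python
def classify_dep(fold_change, p_value, fc_cut=1.5, alpha=0.05):
    """Label a protein as 'up', 'down', or 'ns' using the FC and p thresholds."""
    if p_value >= alpha:
        return "ns"
    if fold_change > fc_cut:
        return "up"
    if fold_change < 1.0 / fc_cut:
        return "down"
    return "ns"
```

A protein must pass both the p-value threshold and one of the two fold-change thresholds; a significant p-value with a modest fold change is still labeled "ns".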

Least absolute shrinkage and selection operator regression analysis
To reduce potential collinearity and avoid over-fitting, least absolute shrinkage and selection operator (LASSO) regression analysis was subsequently used to further screen the most significant proteins using the R software package "glmnet."
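For reference, the standard L1-penalized logistic regression objective that LASSO minimizes for a binary outcome can be written as below. The notation is introduced here for illustration and does not appear in the original text: y_i is the responder status (1 or 0) of patient i, x_i the vector of candidate protein levels, β_0 and β the intercept and coefficients, N the number of patients, and λ the tuning parameter.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
  + \lambda\,\lVert \beta \rVert_1
```

Larger values of λ shrink more coefficients exactly to zero, which is how the penalty reduces a list of candidate proteins to a smaller subset.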

Multivariate logistic regression analysis
LASSO-selected proteins were subsequently included in the multivariate logistic regression analysis. Multivariate logistic regression analysis was used to identify pre-MSC treatment plasma proteomic biomarkers that might distinguish between responders and non-responders to MSC therapy at 28 days. The 58 patients were divided into a training set (n = 47) and a validation set (n = 11) at a 4:1 ratio using stratified random sampling with the R caret package [16,17]. Subsequently, a multiple biomarker panel was developed using the training set data. Fivefold cross-validation was used in the training set. Finally, the performance of the multiple biomarker panel was verified using the validation set (Fig. 2).
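A stratified 4:1 split of this kind can be sketched in a few lines. The study used the R caret package; the Python function below is a hypothetical stand-in rather than that implementation, and rounding the training count up within each class is an assumption chosen here because it yields 47 training and 11 validation patients for 35 responders and 23 non-responders.

```python
import random


def stratified_split(labels, train_parts=4, total_parts=5, seed=0):
    """Split sample indices into train/validation sets, class by class.

    Within each class, ceil(n * train_parts / total_parts) samples go to
    the training set and the remainder to the validation set.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    train, valid = [], []
    for lab in sorted(by_class):
        idxs = by_class[lab][:]
        rng.shuffle(idxs)
        # Integer ceiling division avoids floating-point rounding issues.
        n_train = -(-len(idxs) * train_parts // total_parts)
        train.extend(idxs[:n_train])
        valid.extend(idxs[n_train:])
    return sorted(train), sorted(valid)


# 35 responders (1) and 23 non-responders (0), as in the cohort
labels = [1] * 35 + [0] * 23
train_idx, valid_idx = stratified_split(labels)
print(len(train_idx), len(valid_idx))  # 47 11
```

Splitting within each class keeps the responder to non-responder ratio similar in both sets, which matters when the validation set is as small as 11 patients.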

Statistical analysis
Continuous variables were presented as mean with standard deviation and analyzed using an unpaired Student's t-test for normally distributed variables, or as median with interquartile range (IQR) and analyzed using the Mann-Whitney U-test for skewed data. Categorical variables were expressed as absolute numbers and percentages and were analyzed using the Chi-square or … For all tests, a two-tailed p < 0.05 was considered statistically significant. Data were analyzed using R (version 4.1.2) and Python (version 3.9.0).

Results
Baseline
… none of the patients died, and all nucleic acid tests were negative.

Differentially expressed proteins in plasma before MSC treatment between responders and non-responders
To obtain a comprehensive understanding of plasma protein levels in the enrolled patients, proteomic analysis was performed. A total of 1101 proteins were identified by proteomic analysis of the patients' plasma. As shown in Additional file 1: Fig. S1, the distribution and quality control of the total peptides and proteins detected indicated that the proteomics data were of good quality and reproducibility. Subsequently, plasma protein levels were compared between the responders and non-responders. Proteome screening identified 13 differentially expressed proteins in plasma with significant FC and p values between the responders and non-responders (Table 2 and Fig. 3A). As shown in Fig. 3B, the responders had three upregulated proteins (CSPG2, CTRB1, and OSCAR) and 10 downregulated proteins (ANXA1, AGRG6, CAPG, DDX55, KV133, LEG10, OXSR1, PICAL, PTGDS, and S100A8) compared with the non-responders. The subsequent GO BP analysis of these 13 proteins, sorted according to gene ratio and p values, indicated that their functions were mainly enriched in multicellular organism development, system development, immune system process, and nervous system development (Fig. 3C). GO CC analysis revealed significant enrichment in the extrinsic components of the membrane and plasma membrane. GO MF analysis revealed significant enrichment in anion, lipid, and phospholipid binding. The KEGG enrichment was primarily related to arachidonic acid metabolism. These 13 proteins were then subjected to LASSO regression analysis, which selected 10 of them (Fig. 4); the names and coefficients of these proteins are shown in Additional file 2: Table S1.

Potential proteomic biomarkers for predicting response to MSC therapy
To construct a concise prediction biomarker panel, combinations of no more than five of the 10 proteins screened by LASSO regression were evaluated as candidates for the predictive model. Based on the training set, the ideal panel was defined as the combination of DDX55, AGRG6, PICAL, CTRB1, and ANXA1. Finally, multivariate logistic regression identified AGRG6, PICAL, CTRB1, and ANXA1 as independent factors that predicted response to MSC therapy in patients with severe COVID-19. Low levels of AGRG6, PICAL, and ANXA1, as well as elevated levels of CTRB1, were associated with responsiveness to MSC therapy (Table 3). Figure 5A shows the predictive nomogram with weights and points. The calibration plot showed good agreement between the nomogram predictions and actual observations (Fig. 5B). The area under the curve (AUC) of the ROC for responders vs. non-responders was 0.910 (95% CI 0.818-1.000) in the training set and 0.767 (95% CI 0.459-1.000) in the validation set (Fig. 5C and D), indicating that this model could predict responsiveness, although the confidence interval in the validation set was wide.

Table 2
Up-/down-regulated differentially expressed proteins (DEPs) between the responders and non-responders pre-MSC treatment. Differentially expressed proteins (DEPs) are defined as those proteins with a fold change (FC) > 1.5 or FC < 1/1.5 and p < 0.05 between the responders and non-responders. Upregulated proteins in the responders compared with the non-responders: FC > 1.5 and p < 0.05. Downregulated proteins in the responders compared with the non-responders: FC < 1/1.5 and p < 0.05.
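The AUC values reported for the training and validation sets summarize how often the model ranks a responder above a non-responder. A minimal sketch of that calculation is given below; it uses the pairwise (Mann-Whitney) definition of the AUC, the function name is hypothetical, and it does not reproduce the confidence intervals or the software used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (responder, non-responder) pairs ranked correctly.

    `labels` holds 1 for responders and 0 for non-responders; `scores`
    holds the predicted probabilities. Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))  # 0.75
```

With only 11 patients in the validation set, the number of responder and non-responder pairs is small, so a single misranked pair shifts the AUC noticeably, which is consistent with the wide confidence interval reported there.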

Discussion
MSC transfusion has been found to improve the recovery from lung injury in severe COVID-19; however, only some of the treated patients responded well to MSC treatment [14]. In this context, predicting the response to MSC and selecting potential responders before MSC therapy is highly desirable. The mechanism of MSC treatment may involve multiple targets, including anti-inflammatory, immunomodulatory, and regenerative effects. MS examines the proteome as a whole. Therefore, we believe that a systematic proteomic approach is necessary to identify multiple biomarkers.
Our results showed that low levels of DDX55, ANXA1, PICAL, and AGRG6 and elevated levels of CTRB1 before MSC treatment were biomarkers for predicting response to MSC therapy in severe COVID-19 patients. Therefore, the assessment of these proteins before MSC therapy may be a promising new approach to identify potential responders and give them a better chance of attaining full recovery after MSC therapy.
In this study, we developed a predictive model using five selected proteomic markers and found that this model could discriminate responders from non-responders. Previous prognostic models for COVID-19 have mostly been used to predict outcomes after standard care, including disease progression, acute respiratory distress syndrome, admission to ICU, death, duration of mechanical ventilation, complications of cardiac injury, and thrombosis [18]. However, before this study, no predictive model had been used to identify which COVID-19 patients will benefit from MSC treatment. For this reason, we cannot determine whether the sensitivity and specificity of this model are superior to those of others. Notably, our results indicated that the discrimination potential of the five-marker panel was superior to that of any single marker (DDX55, ANXA1, PICAL, AGRG6, or CTRB1).
The mechanisms linking decreased levels of DDX55, ANXA1, PICAL, and AGRG6 and elevated levels of CTRB1 before MSC treatment with responsiveness to MSC therapy are still unclear. As the above-mentioned proteins play roles in physiological and pathological functions, investigating their function is essential to obtain an accurate understanding of disease pathogenesis. ANXA1, also known as lipocortin-1, is a member of a family of proteins that bind to membrane phospholipids, resulting in the inhibition of phospholipase A2 and eicosanoid production [19]. It is an endogenous suppressive modulator of inflammation expressed in monocytes and neutrophils [20]. ANXA1 is reportedly involved in SARS-CoV-2 infection, especially in patients with severe disease, through interactions with complement molecules and lipids, mediating a systemic cytokine storm [21,22]. A case-control study demonstrated that Annexin A1 is a potential prognostic biomarker in the diagnosis of COVID-19 pneumonia and in predicting the need for ICU treatment in patients with COVID-19 [23]. In this study, we found that patients with low ANXA1 levels were more likely to respond to MSC treatment and have a better prognosis. This may be because patients with low ANXA1 levels before treatment have weak interactions with complement molecules and lipids, mediating a low cytokine response in COVID-19. The level of ANXA1 is regulated by exogenous drugs and cells, such as ingested glucocorticoids [24][25][26] and MSC paracrine signaling [27]. PICAL is a cytoplasmic adapter protein that plays a critical role in clathrin-mediated endocytosis and is found in the nasopharynx, bronchial, and lung tissues. Depletion of this protein has previously been shown to inhibit clathrin-mediated endocytosis [28]. In COVID-19, it has been established that both clathrin-mediated and clathrin-/caveolae-independent endocytosis are essential mechanisms for the internalization of SARS-CoV [29]. AGRG6/Adgrg6, also named GPR126, is a member of the adhesion G-protein-coupled receptor family of proteins involved in cell adhesion and signaling. Although the precise function of this protein has not been elucidated, its expression is highest in the adult lungs [30]. It has been reported that AGRG6 polymorphisms are associated with FEV1/FVC at genome-wide significance [31]. At present, studies on CTRB1 are mainly focused on pancreas-related diseases, and its role in lung diseases needs to be further explored. We believe that elucidating the mechanisms that link these proteins to the response to MSC therapy in COVID-19 might also provide potential targets for new therapeutic strategies.
The dynamic changes in plasma proteins after MSC infusion, including these five proteomic markers, are of great interest. MSC infusion reduced the levels of TGFβ, TNF-α, type I and III collagen, C-reactive protein, and neutrophil extracellular traps, while increasing IDO, PGE2, IL-10, IL-4, TGF-α, VEGF, FGF, HGF, and KGF [32,33]. Further study will be necessary. In particular, it would be interesting to see whether the dynamic changes in these proteins enhance the performance of prediction models for responsiveness to MSC treatment.
There were several limitations to this study. First, the sample size was small, and a larger sample size is necessary for further validation. Second, an internal cohort was used for model validation. These findings must be further verified in external validation cohorts before they can be used in clinical settings.

Conclusions
The responsiveness to MSC therapy appears to depend on baseline levels of DDX55, AGRG6, PICAL, CTRB1, and ANXA1. Clinicians should take these factors into consideration when deciding whether to initiate MSC therapy in patients with severe COVID-19. In the future, external validation cohorts and large prospective studies are needed to confirm the preliminary findings of this study. It is also necessary to gain a better understanding of the longitudinal dynamics of plasma proteomic markers during MSC intervention.

Fig. 1
Participants in this study

Fig. 3
Thirteen differentially expressed proteins (DEPs) and their functional enrichment analysis between the responders and non-responders pre-MSC treatment. A Heatmap showing 13 differentially expressed proteins between the responders and non-responders. B The red dots represent the upregulated DEPs based on p < 0.05 and fold change (FC) > 1.5, and the blue dots represent downregulated DEPs based on p < 0.05 and FC < 1/1.5. The gray dots represent proteins with no significant difference. C GO-based enrichment analysis of DEPs in terms of biological processes (hypergeometric test; p < 0.05). GO terms were sorted according to p

Fig. 4
Selection of 10 proteins from the 13 DEPs using a LASSO binary logistic regression model between the responders and non-responders. A LASSO coefficient profiles of the 13 proteins. A vertical line was drawn at the value selected using fivefold cross-validation, where the optimal λ resulted in 10 nonzero coefficients. B Tuning parameter (λ) selection in the LASSO model used fivefold cross-validation via minimum partial likelihood deviance. Abbreviations: LASSO, least absolute shrinkage and selection operator

Fig. 5
Related biomarkers of lung pathological recovery in severe COVID-19 patients in the MSC group. A Nomogram of the five-protein combination to predict a percentage decline rate of consolidation ≤ −51.99% after MSC therapy (28-day good outcome). To use the nomogram, the protein expression level of an individual is located on each variable axis, and a line is drawn upward to determine the number of points received for each protein expression level. The sum of these numbers is located on the total points axis. A line is drawn downward to the 28-day good outcome probability axis to determine the probability of a percentage decline rate of consolidation ≤ −51.99% after MSC therapy. B Calibration plot and confusion matrix to assess the accuracy of the model in 58 patients. C The receiver operating characteristic (ROC) curve and confusion matrix to distinguish responders from non-responders in the training set. The ROC curves were created by plotting the sensitivity (i.e., true-positive rate) against 1 − specificity (i.e., false-positive rate). The line in each plot represents the area under the curve (AUC). D ROC curve and confusion matrix to distinguish responders from non-responders after MSC therapy in the validation set

Table 1
Baseline demographic and clinical characteristics in the MSC group
Columns: Training set (n = 47); Validation set (n = 11)
Values are n (%), mean (SD), or median [IQR] for skewed data. BMI, body mass index; IL-6, interleukin-6; RBC, red blood cell count

Table 3
Final model of the multivariate logistic regression analysis in the training set